-
Notifications
You must be signed in to change notification settings - Fork 294
Introduce ExecutionMode.WATCHER_KUBERNETES to use watcher with KubernetesPodOperator
#2207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
68 commits
Select commit
Hold shift + click to select a range
25bc89c
Refactor watcher operators so the base definitions can be reused by t…
tatiana 7c96d1b
Fix unittests
tatiana a993b0d
Improve test coverage
tatiana f1a4e28
Introduce Cosmos Watcher K8s mode
tatiana 0c4e67e
Create override class
tatiana ef4a20a
Add patched pod manager
tatiana e8c5ab9
Make event capture in execute_complete consistent
tatiana 4e27db3
Use latest version of AF and k8s provider
tatiana 8eedf4e
Attempt to make compatible with older versions of the K8s pod manager
tatiana aa0f1e6
Attempt to make compatible with older versions of the K8s pod manager
tatiana 77a27ce
Try to fix integration tests
tatiana 740abea
Fix K8s provider version check
tatiana 0ab7576
Revert change to example k8s dag
tatiana 43c33ca
Improve K8s integration tests
tatiana 86db5f3
Fix running k8s watcher mode
tatiana 2515e45
Fix running k8s watcher mode
tatiana 7e9e638
Try to fix CI issue
tatiana e035916
Try to fix CI issue
tatiana 73f87ee
Fix version of protobuf so is compatible between Airflow and dbt in K…
tatiana 1298fcc
Add missing DAG
tatiana cf9f487
Do not run the k8s dag when it's not desired
tatiana 67e170c
Release 1.13.0a1
tatiana c924e53
Release 1.13.0a1
tatiana b625a24
Improve log message when trying to poke XCom
tatiana 2e9964f
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] bcb07b1
Fix path to triggerer
tatiana 846962d
Improve watcher code coverage
tatiana 239f022
Adjustments after rebase
tatiana 52ae820
Clean up tests
tatiana 7ab87b8
Fix import path
tatiana fb0cf72
Address feedback from @johnhoran in https://github.com/astronomer/ast…
tatiana ef7bde1
Replace Cosmos K8s Pod operator fix by the one proposed by @johnhoran…
tatiana 5a6012d
Improve test coverage
tatiana 224434c
Adjust watcher k8s to use base class
tatiana 5ccb1bf
Fix logging
tatiana d407ae2
Fix unittests after rebase based on
tatiana 9fdf602
Fix type error
tatiana 4ed2bf6
Fix import error
tatiana db2034b
Tell codecov to ignore file that is from Airlfow code-base
tatiana 8e3d400
Improve test coverage of cosmos/operators/_watcher/base.py
tatiana 37f1372
Improve test coverage of AFTER_ALL with KUBERNETES_WATCHER
tatiana 072bbaa
Add tests for DbtBuildWatcherKubernetesOperator
tatiana 57a07bf
Add test for validating the K8s watcher consumer behaves as a sensor …
tatiana 544643a
Reduce create_test_task_metadata function complexity
tatiana 13fd0d2
Fix test name
tatiana e9a94d3
Add tests related to watcher k8s retries
tatiana 989e909
Address PR feedback
tatiana 62d87bc
Address PR feedback https://github.com/astronomer/astronomer-cosmos/p…
tatiana b5bfce0
Apply suggestion from @Copilot
tatiana a6f3fce
Revert log message
tatiana ba7f0dc
Address copilot feedback
tatiana 19903ea
Update dev/dags/jaffle_shop_watcher_kubernetes.py
tatiana 784bccd
Update tests/operators/test_watcher_kubernetes.py
tatiana 77f44db
Merge branch 'main' into k8s-watcher-mess
tatiana 7bfa62f
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] a56f8d4
Fix tests after rebase
tatiana ad41f97
Fix tests with old function name
tatiana a6a8c1b
Revert unnecessary changes to jaffle_shop_kubernetes DAG
tatiana 48baa8b
Fix after rebase
tatiana 176b0c1
Update tests/test_example_k8s_dags.py
tatiana e7b7a20
Update cosmos/airflow/_override.py
tatiana 17de369
Update cosmos/operators/_watcher/triggerer.py
tatiana 495bafc
Rename _use_event to use_event
tatiana 5ec7e17
Revert from AF 3.1 to 3.0
tatiana 025552a
Merge branch 'main' into k8s-watcher-mess
tatiana b06bacc
Fix running kubernetes watcher unit tests
tatiana c6c156e
Fix running k8s watcher unit tests
tatiana 1f18858
Update .github/workflows/test.yml
tatiana File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ignore: | ||
| - "cosmos/airflow/_override.py" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,193 @@ | ||
| import math | ||
| import time | ||
| from collections.abc import Callable | ||
| from datetime import timedelta | ||
|
|
||
| import pendulum | ||
| from airflow.providers.cncf.kubernetes import __version__ as airflow_k8s_provider_version | ||
| from airflow.providers.cncf.kubernetes.callbacks import ExecutionMode | ||
| from airflow.providers.cncf.kubernetes.utils.pod_manager import PodLoggingStatus, PodManager | ||
| from airflow.utils.timezone import utcnow | ||
| from kubernetes.client.models.v1_pod import V1Pod | ||
| from packaging.version import Version | ||
| from pendulum import DateTime | ||
| from urllib3.exceptions import HTTPError, TimeoutError | ||
|
|
||
| from cosmos.constants import _K8s_WATCHER_MIN_K8S_PROVIDER_VERSION | ||
|
|
||
|
|
||
| # This is being added to overcome the issue with the KubernetesPodOperator logs repeating: | ||
| # https://github.com/apache/airflow/issues/59366 | ||
| # It can be removed once it is fixed in the upstream provider. | ||
| class CosmosKubernetesPodManager(PodManager): # type: ignore[misc] | ||
| """Create, monitor, and otherwise interact with Kubernetes pods for use with the KubernetesPodOperator.""" | ||
|
|
||
| def fetch_container_logs( # noqa: C901 | ||
| self, | ||
| pod: V1Pod, | ||
| container_name: str, | ||
| *, | ||
| follow: bool = False, | ||
| since_time: DateTime | None = None, | ||
| post_termination_timeout: int = 120, | ||
| container_name_log_prefix_enabled: bool = True, | ||
| log_formatter: Callable[[str, str], str] | None = None, | ||
| ) -> PodLoggingStatus: | ||
| """ | ||
| Follow the logs of container and stream to airflow logging. | ||
|
|
||
| Returns when container exits. | ||
|
|
||
| Between when the pod starts and logs being available, there might be a delay due to CSR not approved | ||
| and signed yet. In such situation, ApiException is thrown. This is why we are retrying on this | ||
| specific exception. | ||
|
|
||
| :meta private: | ||
| """ | ||
|
|
||
| def consume_logs( # noqa: C901 | ||
| *, since_time: DateTime | None = None | ||
| ) -> tuple[DateTime | None, Exception | None]: | ||
| """ | ||
| Try to follow container logs until container completes. | ||
|
|
||
| For a long-running container, sometimes the log read may be interrupted | ||
| Such errors of this kind are suppressed. | ||
|
|
||
| Returns the last timestamp observed in logs. | ||
| """ | ||
| # Cosmos implementation difference when compared to proposal to fix the issue in the upstream provider: | ||
| # https://github.com/apache/airflow/pull/59372/ | ||
| if Version(airflow_k8s_provider_version) >= Version("10.10.0"): | ||
| from airflow.providers.cncf.kubernetes.utils.pod_manager import parse_log_line | ||
| elif ( | ||
| Version(airflow_k8s_provider_version) >= _K8s_WATCHER_MIN_K8S_PROVIDER_VERSION | ||
| ): # Successfully tested with Airflow 3.1.0 and K8s provider 10.8.0 and 10.9.0 | ||
| parse_log_line = self.parse_log_line | ||
| else: | ||
| raise ValueError( | ||
| f"Unsupported K8s provider version: {airflow_k8s_provider_version}. " | ||
| f"Minimum required version is {_K8s_WATCHER_MIN_K8S_PROVIDER_VERSION}" | ||
| ) | ||
| # Cosmos custom implementation finishes here. | ||
|
|
||
| exception = None | ||
| last_captured_timestamp = None | ||
| # We timeout connections after 30 minutes because otherwise they can get | ||
| # stuck forever. The 30 is somewhat arbitrary. | ||
| # As a consequence, a TimeoutError will be raised no more than 30 minutes | ||
| # after starting read. | ||
| connection_timeout = 60 * 30 | ||
| # We set a shorter read timeout because that helps reduce *connection* timeouts | ||
| # (since the connection will be restarted periodically). And with read timeout, | ||
| # we don't need to worry about either duplicate messages or losing messages; we | ||
| # can safely resume from a few seconds later | ||
| read_timeout = 60 * 5 | ||
| try: | ||
| since_seconds = None | ||
| if since_time: | ||
| try: | ||
| since_seconds = math.ceil((pendulum.now() - since_time).total_seconds()) | ||
| except TypeError: | ||
| self.log.warning( | ||
| "Error calculating since_seconds with since_time %s. Using None instead.", | ||
| since_time, | ||
| ) | ||
| logs = self.read_pod_logs( | ||
| pod=pod, | ||
| container_name=container_name, | ||
| timestamps=True, | ||
| since_seconds=since_seconds, | ||
| follow=follow, | ||
| post_termination_timeout=post_termination_timeout, | ||
| _request_timeout=(connection_timeout, read_timeout), | ||
| ) | ||
| message_to_log = None | ||
| message_timestamp = None | ||
| progress_callback_lines = [] | ||
| try: | ||
| for raw_line in logs: | ||
| line = raw_line.decode("utf-8", errors="backslashreplace") | ||
| line_timestamp, message = parse_log_line(line) | ||
| if line_timestamp: # detect new log line | ||
| if message_to_log is None: # first line in the log | ||
| message_to_log = message | ||
| message_timestamp = line_timestamp | ||
| progress_callback_lines.append(line) | ||
| else: # previous log line is complete | ||
| for callback in self._callbacks: | ||
| callback.progress_callback( | ||
| line=message_to_log, | ||
| client=self._client, | ||
| mode=ExecutionMode.SYNC, | ||
| container_name=container_name, | ||
| timestamp=message_timestamp, | ||
| pod=pod, | ||
| ) | ||
| self._log_message( | ||
| message_to_log, | ||
| container_name, | ||
| container_name_log_prefix_enabled, | ||
| log_formatter, | ||
| ) | ||
| last_captured_timestamp = message_timestamp | ||
| message_to_log = message | ||
| message_timestamp = line_timestamp | ||
| progress_callback_lines = [line] | ||
| else: # continuation of the previous log line | ||
| message_to_log = f"{message_to_log}\n{message}" | ||
| progress_callback_lines.append(line) | ||
| finally: | ||
| # log the last line and update the last_captured_timestamp | ||
| if message_to_log is not None: | ||
| for callback in self._callbacks: | ||
| callback.progress_callback( | ||
| line=message_to_log, | ||
| client=self._client, | ||
| mode=ExecutionMode.SYNC, | ||
| container_name=container_name, | ||
| timestamp=message_timestamp, | ||
| pod=pod, | ||
| ) | ||
| self._log_message( | ||
| message_to_log, container_name, container_name_log_prefix_enabled, log_formatter | ||
| ) | ||
| last_captured_timestamp = message_timestamp | ||
| except TimeoutError as e: | ||
| # in case of timeout, increment return time by 2 seconds to avoid | ||
| # duplicate log entries | ||
| if val := (last_captured_timestamp or since_time): | ||
| return val.add(seconds=2), e | ||
| except HTTPError as e: | ||
| exception = e | ||
| self._http_error_timestamps = getattr(self, "_http_error_timestamps", []) | ||
| self._http_error_timestamps = [ | ||
| t for t in self._http_error_timestamps if t > utcnow() - timedelta(seconds=60) | ||
| ] | ||
| self._http_error_timestamps.append(utcnow()) | ||
| # Log only if more than 2 errors occurred in the last 60 seconds | ||
| if len(self._http_error_timestamps) > 2: | ||
| self.log.exception( | ||
| "Reading of logs interrupted for container %r; will retry.", | ||
| container_name, | ||
| ) | ||
| return last_captured_timestamp or since_time, exception | ||
|
|
||
| # note: `read_pod_logs` follows the logs, so we shouldn't necessarily *need* to | ||
| # loop as we do here. But in a long-running process we might temporarily lose connectivity. | ||
| # So the looping logic is there to let us resume following the logs. | ||
| last_log_time = since_time | ||
| while True: | ||
| last_log_time, exc = consume_logs(since_time=last_log_time) | ||
| if not self.container_is_running(pod, container_name=container_name): | ||
| return PodLoggingStatus(running=False, last_log_time=last_log_time) | ||
| if not follow: | ||
| return PodLoggingStatus(running=True, last_log_time=last_log_time) | ||
| # a timeout is a normal thing and we ignore it and resume following logs | ||
| if not isinstance(exc, TimeoutError): | ||
| self.log.warning( | ||
| "Pod %s log read interrupted but container %s still running. Logs generated in the last one second might get duplicated.", | ||
| pod.metadata.name, | ||
| container_name, | ||
| ) | ||
| time.sleep(1) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.